We consider perfect-information reachability stochastic games for 2 players on infinite graphs. We identify a subclass of such games and prove two interesting properties of it: first, Player Max always has optimal strategies in games from this subclass, and second, these games are strongly determined. The subclass is defined by the property that the set of all values can have only one accumulation point, namely 0. Our results nicely mirror recent results for finitely-branching games, where, on the contrary, Player Min always has optimal strategies. However, our proof methods are substantially different, because the roles of the players are not symmetric. We also do not restrict the branching of the games. Finally, we apply our results in the context of recently studied One-Counter stochastic games.



1 Introduction 

Two-player turn-based zero-sum stochastic games, simply called "games" in this text, evolve randomly in discrete transitions from one of countably many states to another. The winning condition is some property of such infinite evolutions. Each state is owned by Player Max, owned by Player Min, or stochastic, and has a fixed, possibly infinite, set of available outgoing transitions. The states and transitions define a game graph; an infinite path in this graph is called a run. The set of runs comes with the product topology over the discrete state space, i.e., open sets are generated by sets of runs sharing a common finite prefix. In stochastic states, the successor is sampled according to a fixed distribution, whereas players choose successors in the states they own, based on the history of the play so far. This induces a probability measure on Borel-measurable sets of runs in a natural way.

A winning condition is a set $W$ of runs. A run from $W$ is won by Player Max; the other runs are won by Player Min (the games are zero-sum). For a Borel-measurable set $W$, a fixed pair $(\sigma, \pi)$ of strategies for Player Max and Min, respectively, and an initial state $s$, the probability that Max wins is denoted by $\mathcal{P}^{\sigma,\pi}_s[W]$. The value of the game in $s$, denoted by $Val(s)$, is defined as

$$Val(s) := \sup_{\sigma} \inf_{\pi} \mathcal{P}^{\sigma,\pi}_s[W] = \inf_{\pi} \sup_{\sigma} \mathcal{P}^{\sigma,\pi}_s[W]. \tag{1}$$

The above equality, a consequence of a more general Blackwell-determinacy result of Martin [12], implies that for every $\varepsilon > 0$ both players have so-called $\varepsilon$-optimal strategies $\sigma_\varepsilon$ and $\pi_\varepsilon$ such that $\inf_{\pi} \mathcal{P}^{\sigma_\varepsilon,\pi}_s[W] \geq Val(s) - \varepsilon$ and $\sup_{\sigma} \mathcal{P}^{\sigma,\pi_\varepsilon}_s[W] \leq Val(s) + \varepsilon$. This need not hold for $\varepsilon = 0$: optimal (i.e., 0-optimal) strategies may exist for neither of the players.

We consider a stronger notion of determinacy than (1), and call a game strongly determined if for every state $s$, every $0 \leq v \leq 1$, and every $\rhd \in \{>, \geq\}$, either Player Max has a strategy $\sigma$ such that $\forall \pi: \mathcal{P}^{\sigma,\pi}_s[W] \rhd v$, or Player Min has a strategy $\pi$ such that $\forall \sigma: \mathcal{P}^{\sigma,\pi}_s[W] \not\rhd v$. Denote $L := \sup_{\sigma} \inf_{\pi} \mathcal{P}^{\sigma,\pi}_s[W]$ and $R := \inf_{\pi} \sup_{\sigma} \mathcal{P}^{\sigma,\pi}_s[W]$. If Max has a strategy $\sigma$ such that $\forall \pi: \mathcal{P}^{\sigma,\pi}_s[W] \geq v$, then $v \leq L$. Similarly, if Player Min has a strategy $\pi$ such that $\forall \sigma: \mathcal{P}^{\sigma,\pi}_s[W] < v$, then $v \geq R$. By strong determinacy, $\forall v: \neg(R > v > L)$, thus $R \leq L$. Since $L \leq R$ follows from the definitions, strong determinacy implies determinacy. On the other hand, it is easy to see that the existence of $\varepsilon$-optimal strategies for both players implies strong determinacy for the cases where $|v - Val(s)| > 2\varepsilon$: the players simply use their $\varepsilon$-optimal strategies to win. This works even for $\varepsilon = 0$, thus whenever both players have optimal strategies, the game is strongly determined (for all $v$). To sum up the relation between the three key notions: every game with a Borel winning condition is determined in the sense of (1), some of these games are strongly determined, and some of the strongly determined games are those admitting optimal strategies for both players. Example 1 and [7, Fig. 1] show that both inclusions are proper. More precisely, in the game from Example 1, which we present later, Player Min has only one (trivial) strategy, thus the game is strongly determined. However, there is a state $r_0$ such that for every fixed strategy of Max the probability of winning is strictly below $Val(r_0)$. The game from [7, Fig. 1] is composed of two halves, one of which is essentially equivalent to the game in Example 1, while the other is a similar game adapted for Min (infinite branching is needed). As a consequence, neither Player Max in the first half nor Player Min in the second half has optimal strategies. Thus, if one player fixes a strategy first, which can be at best $\varepsilon$-optimal for some $\varepsilon > 0$, the other player may choose an $\varepsilon/2$-optimal strategy to beat the first player. As a consequence, no player has a winning strategy.

We are especially interested in the situation when $W$ is an open set, and call such games open as well. This includes all reachability conditions, where $W$ is the set of all runs visiting a state from a distinguished set $T$ of target states. For reachability, results of [7, 6] imply (see Corollary 1) that Player Min always has optimal strategies if every state $s$ owned by Min has at least one successor $t$ such that $Val(s) = Val(t)$. This is always the case in finitely-branching games, where every state has only finitely many successors. On the other hand, even in very simple reachability games where every state has at most 2 successors, Player Max may not have an optimal strategy (cf. Example 1). Our main result gives a condition sufficient for the existence of optimal strategies for Player Max.

Theorem 1. Let $\mathcal{G}$ be an open stochastic game. Player Max has an optimal strategy in all states if

the set $V_\varepsilon := \{Val(s) \mid s \text{ is a state of } \mathcal{G} \wedge Val(s) \geq \varepsilon\}$ is finite for every $\varepsilon > 0$. (*)

In particular, $\mathcal{G}$ is not assumed to be finitely-branching. Condition (*) just says that the set $V := \{Val(s) \mid s \text{ is a state}\}$ has no accumulation points, or that its only accumulation point is 0. It is a trivial task to construct a game where neither player owns a single state, i.e., a Markov chain, and where the set $V$ has accumulation points other than 0. In Markov chains, however, each player has only one, trivial, strategy, which must thus be optimal. This shows that (*) is not necessary. However, there are at least two reasons why (*) is interesting. First, we identify a class of recently studied infinite-state stochastic games which satisfy the assumption of Theorem 1 and for which the existence of optimal strategies for Max was not known before. This class, described properly later, consists of games generated by One-Counter automata [2, 4] which satisfy a certain additional property that can be tested algorithmically. As a special case, this class includes a maximizing variant of Solvency Games [1].

Second, in Examples 1 and 2 we show games where Player Max lacks optimal strategies. These games are rather simple and violate (*) only "very slightly"; in particular, they (1) are finitely-branching, and in fact both the out-degree and the in-degree of their game graphs are bounded by 2, (2) contain no states of Player Min at all, (3) have uniform transition probabilities in all stochastic states, and (4) their sets $V$ have only one accumulation point. This point is 1 in Example 1 and 1/2 in Example 2. In the latter case, the accumulation point is approached only from above, and $V \cap [0, 1/2) = \{0\}$. Thus it is not possible to weaken the assumption (*) in Theorem 1 by allowing accumulation points other than 0.

As noted before, if both players have optimal strategies, then the game is strongly determined. But even for finitely-branching reachability games, where Player Max may not have optimal strategies and only Player Min always does [7], strong determinacy still holds. Interestingly, we show here that strong determinacy also survives under (*), where Max has optimal strategies and Min may not have them.
Theorem 2. Let $\mathcal{G}$ be an open stochastic game satisfying (*). Then $\mathcal{G}$ is strongly determined.

Related work and open questions. Blackwell games are more general than our stochastic games: players there choose their moves simultaneously, not knowing the concurrent choice of the opponent. A famous determinacy result in the sense of (1) for Blackwell games is given in [12]. Finitely-branching reachability games have been studied in [11] as a theoretical background for some algorithmic results concerning BPA games (i.e., games with graphs generated by stateless pushdown automata). Finite-state reachability stochastic games were studied in [8]. In view of the existence of optimal strategies and strong determinacy, finite-state games are not interesting: optimal strategies always exist there. However, the precise complexity of the associated computational problems for these games is a long-standing and interesting open problem.

Theorem 2 and the results from [7, 6] give us two classes of strongly determined games: games satisfying (*), and finitely-branching games, respectively. Neither of these two classes is contained in the other. The most interesting question, in our opinion, is whether the following conjecture is true, and if it is not, under which restrictions on $W$ and/or $\mathcal{G}$, as weak as possible, it becomes true.
Conjecture 1. Let $\mathcal{G}$ be a stochastic game and $W$ a winning condition such that Player Max (or Player Min) has an optimal strategy in every state of $\mathcal{G}$. Then $\mathcal{G}$ is strongly determined.

We do not even know whether the conjecture is true for all games where $W$ is a reachability condition. Other open questions include finding new interesting classes of games where one of the players is guaranteed to have optimal strategies.

Outline of the paper. We briefly formalise the necessary notions and recall some important known facts in Section 2. In Section 3 we prove Theorem 1 in the special case of games without Player Min. Both theorems are then proved in full generality in Section 4. Finally, in Section 5 we briefly explain what One-Counter games are, and apply our results to them.

2 Preliminaries 

As noted in the Introduction, we use the simple term "games" for our special kind of games (Definition 2). Because we do not speak about other games here, we hope the reader will excuse this inaccuracy.
Definition 1. A game graph $G = (S, \to, \delta)$ has a countable set $S$ of states, partitioned into sets $S_0$, $S_1$, $S_2$ of stochastic states, states of Player Max, and states of Player Min, respectively; a countable transition relation $\to \subseteq S \times S$ such that $\forall r \in S\ \exists s \in S: r \to s$; and a probability weight function $\delta: S_0 \times S \to [0, 1]$ such that for all $r \in S_0$ we have $\sum_{r \to s} \delta(r, s) = 1$.

A run is an infinite path in a game graph. For a finite path $w$, we denote the states it visits by $w(0), w(1), \ldots, w(k)$, and call $k = len(w)$ the length of $w$. $Run(w)$ is the set of all runs extending $w$. Unions of sets of the form $Run(w)$ are called open sets; they are open in the product topology over the discrete space $S$. Closing the set of open sets under complements and countable unions defines the set of (Borel-)measurable sets.
Definition 2. A game $\mathcal{G}$ is given by a game graph $G$ and a Borel-measurable set of runs $W$, called the winning condition. If there is some $T \subseteq S$ such that $W = \bigcup\{Run(w) \mid w \text{ ends in } T\}$, then $W$ is a reachability condition, and $\mathcal{G}$ is called a reachability game.

A strategy for Player Max is a function assigning to every finite path (called a history) ending in a state $s \in S_1$ a distribution over the successors of $s$. Similarly, a strategy for Min is defined for histories ending in $S_2$. A strategy is memoryless if it depends only on the last state of the history.

Fixing a pair of strategies $(\sigma, \pi)$ for Max and Min, respectively, we assign to every finite path $w$ the product $p^{\sigma,\pi}(w)$ of the weights on the edges along $w$ given by $\delta$, $\sigma$, and $\pi$. Fixing also an initial state $s$, we define a probability measure $\mathcal{P}^{\sigma,\pi}_s$ by $\mathcal{P}^{\sigma,\pi}_s[Run(w)] := 0$ for $w$ not starting in $s$, $\mathcal{P}^{\sigma,\pi}_s[Run(w)] := p^{\sigma,\pi}(w)$ for $w$ starting in $s$, and extending this to complements and unions so as to satisfy the axioms of a probability measure. The uniqueness of this construction is a standard fact, see, e.g., [13, p. 30].
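To make the weight $p^{\sigma,\pi}(w)$ concrete, here is a minimal Python sketch that multiplies the edge weights along a finite path and returns the cylinder probability $\mathcal{P}^{\sigma,\pi}_s[Run(w)]$. The dictionary-and-callable encoding of owners, $\delta$, $\sigma$ and $\pi$ is a hypothetical choice for illustration only; strategies receive the whole history, so they need not be memoryless.

```python
from fractions import Fraction
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical encoding of a game graph; not taken from the paper.
State = str
History = Tuple[State, ...]
Distribution = Dict[State, Fraction]
Strategy = Callable[[History], Distribution]


def path_weight(w: Sequence[State], owner: Dict[State, int],
                delta: Dict[State, Distribution],
                sigma: Strategy, pi: Strategy) -> Fraction:
    """Return p^{sigma,pi}(w), the product of edge weights along the finite path w.

    owner[r] is 0 (stochastic), 1 (Max) or 2 (Min); delta[r] is the fixed
    distribution of a stochastic state r; sigma and pi map a history (the
    prefix of w ending in the current state) to a distribution on successors.
    """
    weight = Fraction(1)
    for i in range(len(w) - 1):
        current, successor = w[i], w[i + 1]
        history = tuple(w[: i + 1])
        if owner[current] == 0:
            dist = delta[current]
        elif owner[current] == 1:
            dist = sigma(history)
        else:
            dist = pi(history)
        weight *= dist.get(successor, Fraction(0))
    return weight


def cylinder_probability(s: State, w: Sequence[State], owner: Dict[State, int],
                         delta: Dict[State, Distribution],
                         sigma: Strategy, pi: Strategy) -> Fraction:
    """P^{sigma,pi}_s[Run(w)]: zero unless w starts in s, otherwise p^{sigma,pi}(w)."""
    if not w or w[0] != s:
        return Fraction(0)
    return path_weight(w, owner, delta, sigma, pi)
```

For instance, if a Max state $m$ always moves to a stochastic state $a$, and $a$ moves to $m$ or to a target $t$ with probability $1/2$ each, the path $m\,a\,m\,a\,t$ gets weight $1/4$.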

The definition of the value $Val(\cdot)$ given in (1) has thus been formalised. For $\varepsilon \geq 0$, a strategy $\sigma$ for Max is $\varepsilon$-optimal in a state $s$ if $\mathcal{P}^{\sigma,\pi}_s[W] \geq Val(s) - \varepsilon$ for all strategies $\pi$ for Min. The $\varepsilon$-optimal strategies for Min are defined analogously. We call 0-optimal strategies just optimal.

2.1 Technical Assumptions 

Although a game graph may in general have an arbitrary structure, we can always transform it into a forest, without changing the properties of the game, by keeping track of the history inside the states. More precisely, given a game $\mathcal{G} = (G, W)$, $G = (S, \to, \delta)$, consider a game $\mathcal{G}' = (G', W')$, $G' = (S', \to', \delta')$, where the states in $S'$ are just finite sequences of states from $S$. In particular, $S \subseteq S'$, and whenever $r \to s$ in $\mathcal{G}$, then $wr \to' wrs$ in $\mathcal{G}'$. Projecting the states of $S'$ to their last component induces a map $\varphi$ from paths in $G'$ to paths in $G$. We set $W' := \varphi^{-1}(W)$. The map $\varphi$ also induces a map $\Phi$ from strategies in $\mathcal{G}$ to strategies in $\mathcal{G}'$, by sending histories through $\varphi$. Naturally, the partition of $S'$ and the weight function $\delta'$ are both derived from the partition of $S$ and from $\delta$ by projecting states from $S'$ to their last component.
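The history-tracking construction of $\mathcal{G}'$ can be sketched on a finite window. The hypothetical helper below (not from the paper) unfolds a graph given by successor lists into the tree of histories up to a fixed depth, and `project` plays the role of $\varphi$ on states. Because each history $wr$ has the children $wrs$ only, every node has at most one parent, which is exactly why $G'$ is a forest.

```python
from typing import Dict, List, Tuple

History = Tuple[str, ...]


def unfold(succ: Dict[str, List[str]], root: str, depth: int) -> Dict[History, List[History]]:
    """History-tracking unfolding of a graph, cut off after `depth` transitions.

    The node hist = w + (r,) gets the children w + (r, s) for every edge r -> s.
    Nodes at the cut-off depth get no children; a real game graph would not
    allow this, so the result is only a finite window onto the infinite tree.
    """
    tree: Dict[History, List[History]] = {}
    frontier: List[History] = [(root,)]
    for _ in range(depth):
        next_frontier: List[History] = []
        for hist in frontier:
            children = [hist + (s,) for s in succ[hist[-1]]]
            tree[hist] = children
            next_frontier.extend(children)
        frontier = next_frontier
    for hist in frontier:
        tree[hist] = []  # truncated leaves
    return tree


def project(hist: History) -> str:
    """The map phi on states: keep only the last component of a history."""
    return hist[-1]
```

For the graph with edges $a \to a$, $a \to b$, $b \to a$, unfolding from $a$ to depth 2 yields 6 histories, and the two original states reappear only as last components.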

It is easy to verify that for every $s \in S$, if we restrict the game graphs of $\mathcal{G}$ and $\mathcal{G}'$ to the states reachable from $s$, then $\varphi$ is bijective and preserves measurability in both directions. Also $\Phi$ is bijective, and for all measurable $A \subseteq Run(s)$ and all pairs $(\sigma, \pi)$ of strategies: $\mathcal{P}^{\sigma,\pi}_s[A] = \mathcal{P}^{\Phi(\sigma),\Phi(\pi)}_s[\varphi^{-1}(A)]$. As a consequence, $Val(s)$ is the same in $\mathcal{G}$ and $\mathcal{G}'$ for all $s \in S$, and the sets of all values in $\mathcal{G}$ and in $\mathcal{G}'$ are equal. Also, $W$ is open iff $W'$ is a reachability condition. Every strategy in $\mathcal{G}'$ is memoryless, because $G'$ is a forest. Finally, once we have a reachability objective with target set $T$, we may clearly assume, without loss of generality, that all states in $T$ are absorbing. This shows that, to prove Theorems 1 and 2, we may safely assume the following:

Assumption 1. The game graph is always a forest, all strategies are memoryless, and the winning condition is a reachability condition specified by some target set $T \subseteq S$ such that for all $t \in T$ the only edge leaving $t$ is $t \to t$.

2.2 Known Results for Reachability Games 

We state here some known results to be used later. The following gives a characterisation of values, and 
allows us to characterise the existence of optimal strategies for Min. 

Fact 1 (cf. [3, Theorem 3.1]). Let $\mathcal{G} = (G, W)$, $G = (S, \to, \delta)$, be a game with $W = \bigcup\{Run(w) \mid w \text{ ends in } T\}$. The least fixed point of the following (Bellman) functional $\mathcal{V}: (S \to [0, 1]) \to (S \to [0, 1])$ exists and is equal to $Val(\cdot)$:

$$\mathcal{V}(f)(s) := \begin{cases} 1 & \text{if } s \in T,\\ \sup\{f(r) \mid s \to r\} & \text{if } s \in S_1 \setminus T,\\ \inf\{f(r) \mid s \to r\} & \text{if } s \in S_2 \setminus T,\\ \sum_{s \to r} \delta(s, r) \cdot f(r) & \text{if } s \in S_0 \setminus T. \end{cases}$$

Figure 1: A reachability game where Player Max (□ states) has no optimal strategy.
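For a finite game, the least fixed point in Fact 1 can be approximated by value iteration: the functional is then continuous, so the iterates $\mathcal{V}^k(0)$ started from the constant-0 function increase to the least fixed point. The sketch below assumes a finite, explicitly listed game in a hypothetical dictionary encoding, so sup and inf become max and min. Its stopping rule is a heuristic, since the least fixed point may only be reached in the limit (as for the state `c` in the usage example).

```python
from typing import Dict, List, Set


def bellman_values(states: List[str], owner: Dict[str, int],
                   succ: Dict[str, List[str]],
                   delta: Dict[str, Dict[str, float]], target: Set[str],
                   tol: float = 1e-12, max_iter: int = 100_000) -> Dict[str, float]:
    """Approximate the least fixed point of the Bellman functional of Fact 1.

    owner[s] is 0 (stochastic), 1 (Max) or 2 (Min); the game must be finite.
    """
    f = {s: 0.0 for s in states}  # start below the least fixed point
    for _ in range(max_iter):
        g: Dict[str, float] = {}
        for s in states:
            if s in target:
                g[s] = 1.0
            elif owner[s] == 1:   # Max: sup over successors (max, as lists are finite)
                g[s] = max(f[r] for r in succ[s])
            elif owner[s] == 2:   # Min: inf over successors (min, as lists are finite)
                g[s] = min(f[r] for r in succ[s])
            else:                 # stochastic: weighted average under delta
                g[s] = sum(p * f[r] for r, p in delta[s].items())
        change = max(abs(g[s] - f[s]) for s in states)
        f = g
        if change < tol:          # heuristic stop: iterates approach the lfp from below
            break
    return f


states = ["m", "n", "a", "t", "z", "c"]
owner = {"m": 1, "n": 2, "a": 0, "t": 0, "z": 0, "c": 0}
succ = {"m": ["a", "z"], "n": ["a", "t"], "a": ["t", "z"],
        "t": ["t"], "z": ["z"], "c": ["c", "t"]}
delta = {"a": {"t": 0.5, "z": 0.5}, "t": {"t": 1.0},
         "z": {"z": 1.0}, "c": {"c": 0.5, "t": 0.5}}
values = bellman_values(states, owner, succ, delta, {"t"})
```

Here the Max state `m`, the Min state `n` and the stochastic state `a` all get value 1/2, while `c`, which retries a fair coin until it hits the target, has value 1 that the iterates $1 - 2^{-k}$ only approach.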



Corollary 1 (cf. [7, Theorem 3.1]). Let $\mathcal{G}$ be a game as in Fact 1. Let $G' = (S, \to', \delta)$ be a subgraph of $G$ where $\to'$ is a subset of $\to$, and if there is a pair $r, s \in S$ such that $r \to s$ but not $r \to' s$, then $r \in S_2$ and there is some $s' \in S$ such that $r \to' s'$ and $Val(s') \leq Val(s)$ in $\mathcal{G}$. Let $\mathcal{G}' = (G', W)$. Then the values are the same in $\mathcal{G}$ and $\mathcal{G}'$.

As a consequence, a strategy $\pi$ for Min is optimal iff for all $r \in S_2$ it chooses with positive probability only successors $s \in S$ satisfying $Val(r) = Val(s)$.

Proof. Let $\mathcal{V}'$ be the Bellman functional associated with $\mathcal{G}'$. Observe that the values in $\mathcal{G}$ form a fixed point of $\mathcal{V}'$, thus for all $s \in S$, $Val(s)$ in $\mathcal{G}'$ is equal to or less than $Val(s)$ in $\mathcal{G}$. Moreover, it cannot be less, because Player Max has the same set of strategies in $\mathcal{G}'$ as in $\mathcal{G}$, whereas Player Min does not get more strategies in $\mathcal{G}'$. To derive the consequence, remove all edges not used by $\pi$. □

Note that the situation is not symmetric for Player Max. Consider games without Player Min and with out-degree and in-degree bounded by 2. In particular, this implies that every state $r$ of Player Max has at least one successor $s$ with $Val(r) = Val(s)$. Even in these games, Player Max may lack optimal strategies, as illustrated by the following classical example (see, e.g., [11, p. 871], [5, Example 6]).

Example 1. Consider the reachability game from Figure 1. Its game graph $G$ has the set $S := \{r_i, s_i, t_i \mid i \geq 0\}$ of states, partitioned by $S_0 = \{s_i, t_i \mid i \geq 0\}$, $S_1 = \{r_i \mid i \geq 0\}$, and $S_2 = \emptyset$. The transitions are $s_0 \to s_0$, $t_0 \to t_0$, and $r_{i-1} \to r_i$, $r_i \to s_i$, $s_i \to s_{i-1}$, $s_i \to t_i$, and $t_i \to t_{i-1}$ for $i > 0$. Probabilities are always uniform. The target set is $T = \{t_0\}$. Clearly, $Val(s_i) = 1 - 2^{-i}$ for all $i \geq 0$. Thus $Val(r_i) = 1$ for all $i \geq 0$: for every $N > 0$, choosing the transition $r_i \to r_{i+1}$ for $i < N$ and the transition $r_i \to s_i$ for $i \geq N$ is a $2^{-N}$-optimal strategy for Max. Yet Max has no optimal strategy in any $r_i$, $i \geq 0$: no strategy reaching some $s_i$ with positive probability is optimal, and, on the other hand, never reaching any $s_i$ means never reaching $t_0$.
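The numbers in Example 1 can be checked with exact arithmetic. The sketch below (hypothetical helper names) computes $Val(s_i)$ from the recurrence $Val(s_i) = \frac{1}{2} Val(s_{i-1}) + \frac{1}{2} Val(t_i)$ with $Val(t_i) = 1$ and $Val(s_0) = 0$, and the winning probability of the threshold strategy from the text.

```python
from fractions import Fraction


def val_s(i: int) -> Fraction:
    """Val(s_i) in Example 1.

    s_0 is a non-target sink (value 0); for i > 0, s_i moves to s_{i-1} or t_i
    with probability 1/2 each, and t_i leads surely to the target t_0 (value 1).
    """
    v = Fraction(0)
    for _ in range(i):
        v = Fraction(1, 2) * v + Fraction(1, 2)
    return v


def threshold_strategy_value(i: int, n: int) -> Fraction:
    """Winning probability from r_i of the strategy that moves r_j -> r_{j+1}
    for j < n and r_j -> s_j for j >= n (n > 0, as in the text)."""
    if n <= 0:
        raise ValueError("n must be positive; r_0 has no edge to s_0")
    return val_s(max(i, n))
```

Every threshold strategy falls short of $Val(r_0) = 1$ by exactly $2^{-N}$, so the supremum 1 is approached but never attained.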

3 Games without Player Min 

Proposition 1. Let $\mathcal{G} = (G, W)$ be a stochastic game, where $G = (S, \to, \delta)$, $S_2 = \emptyset$¹, and $W$ is open. If (*) from Theorem 1 is satisfied, then Player Max has an optimal strategy in all states.



¹ These are also sometimes called (maximizing) Markov Decision Processes (MDPs), see, e.g., [3, 2, 13].
We fix the game $\mathcal{G}$ from Proposition 1 for the rest of this section, which is devoted to proving the proposition. By Assumption 1, $G$ is a forest, and there is $T \subseteq S$ such that $W = \bigcup\{Run(w) \mid w \text{ ends in } T\}$ and for all $t \in T$ there is only one transition, $t \to t$. The proof is by contradiction, in three steps. First, we prove that if there is a state with no optimal strategy, then there must be a state from which winning with probability sufficiently close to the optimum requires using some value-decreasing transition. A transition $r \to s$ is value-decreasing if $Val(r) > Val(s)$. Second, we argue that the potential "damage" caused by such a transition is positive and bounded away from 0, independently of the actual strategy. Third, we show that (*) implies that this potential "damage" indeed keeps the probability of reaching $T$ bounded away from the value, which contradicts the definition of the value.

We introduce a random variable $L$ (for "loss"). For a run $\omega$, a losing index is every $i$ such that $\omega(i) \in S_1$ and $Val(\omega(i)) > Val(\omega(i+1))$. If there is no losing index for $\omega$, we set $L(\omega) := 0$. Otherwise, there is a least losing index $i$, and we set $L(\omega) := Val(\omega(i)) > 0$. Finally, we say that a state $s \in S$ is losing if there is some $\delta_s > 0$ such that for every $\delta_s$-optimal strategy $\sigma$ in $s$, we have $\mathcal{P}^{\sigma}_s[L > 0] > 0$.

Lemma 1. Assume (*). If $\exists s \in S$ such that $\forall \sigma: \mathcal{P}^{\sigma}_s[W] < Val(s)$, then there is also some losing state.

Proof. By contradiction. Assume there is no losing state; we construct an optimal strategy in every state. Define a subset $\to'$ of the transition relation $\to$ of $\mathcal{G}$ by setting, for every pair $r, s \in S$: $r \to' s$ iff $r \to s$ and either $r \in S_0$, or $Val(r) = Val(s)$. Observe that (*) implies that for all $r \in S_1$ there is at least one $s$ such that $r \to s$ and $Val(r) = Val(s)$. Thus $\to'$ is total and $G' = (S, \to', \delta)$ is a game graph. Since there are no losing states, for every $r \in S$ and every $\varepsilon > 0$ there is some $\varepsilon$-optimal strategy $\sigma$ such that $\mathcal{P}^{\sigma}_r[L > 0] = 0$, i.e., $\sigma$ does not use value-decreasing transitions. This strategy works in $\mathcal{G}' = (G', W)$ as well, winning with the same probability as in $\mathcal{G}$. The values in $\mathcal{G}$ and $\mathcal{G}'$ are thus the same.

Consider now $\mathcal{G}'$. Denote by $FP_k(s)$ the set of all finite paths of length $k$ starting in $s$. Due to the last sentence of Assumption 1, and because $\to'$ preserves the value, the following is true in $\mathcal{G}'$:

$$\forall k \geq 0: \forall \sigma: \forall s \in S: \quad Val(s) = \sum_{w \in FP_k(s)} \mathcal{P}^{\sigma}_s[Run(w)] \cdot Val(w(k)). \tag{2}$$
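The invariant (2) can be checked by brute force on a small finite example. The sketch below assumes a game without Min, a memoryless Max strategy given as a distribution per Max state, and known exact values; the encoding and the function name are hypothetical. It enumerates $FP_k(s)$ explicitly, so it is only meant for tiny examples.

```python
from fractions import Fraction
from typing import Dict, List, Tuple


def expected_value_after(k: int, s: str, owner: Dict[str, int],
                         delta: Dict[str, Dict[str, Fraction]],
                         sigma: Dict[str, Dict[str, Fraction]],
                         val: Dict[str, Fraction]) -> Fraction:
    """Sum over w in FP_k(s) of P^sigma_s[Run(w)] * Val(w(k)) (no Min states)."""
    paths: List[Tuple[Tuple[str, ...], Fraction]] = [((s,), Fraction(1))]
    for _ in range(k):
        extended: List[Tuple[Tuple[str, ...], Fraction]] = []
        for w, prob in paths:
            current = w[-1]
            # stochastic states use delta, Max states use the memoryless sigma
            dist = delta[current] if owner[current] == 0 else sigma[current]
            for nxt, q in dist.items():
                if q > 0:
                    extended.append((w + (nxt,), prob * q))
        paths = extended
    return sum((prob * val[w[-1]] for w, prob in paths), Fraction(0))
```

With a Max state $m$ of value $1/2$ whose value-preserving move leads to a fair coin between the target $t$ and a losing sink $z$, the sum stays $1/2$ for every $k$; if $\sigma$ instead takes the value-decreasing edge $m \to z$ (which exists in $G$ but not in $G'$), the sum drops to 0 after one step.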

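For all $s \in S$, fix a $\frac{1}{4} Val(s)$-optimal strategy $\sigma_s$. After some number $n_s \geq 0$ of steps, $T$ must be reached from $s$ under $\sigma_s$ with probability at least $Val(s)/2$, as $\mathcal{P}^{\sigma_s}_s[W] = \lim_{k \to \infty} \mathcal{P}^{\sigma_s}_s\big[\bigcup\{Run(w) \mid len(w) = k \wedge w(k) \in T\}\big]$.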

For all $s \in S$, we finally construct a strategy $\sigma$ for $\mathcal{G}'$ that is optimal in $s$. Because the values are the same in $\mathcal{G}$ and $\mathcal{G}'$, and every strategy for $\mathcal{G}'$ is also a strategy for $\mathcal{G}$, this will finish the proof of the lemma. The strategy $\sigma$ starts in $s$ according to $\sigma_s$, and follows it for $n_s$ steps. After that, having arrived at some state $r$, it switches to $\sigma_r$ and follows it for another $n_r$ steps. This is repeated ad infinitum. The invariant (2), and the choice of $n_r$ and $\sigma_r$ for $r \in S$, guarantee that after the $m$-th stage of the above repetitive process, $T$ has been reached with probability at least $(1 - 2^{-m}) \cdot Val(s)$, proving that $\sigma$ is optimal. □
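The final counting step of the proof can be spelled out as follows. This is only a sketch: $P_m$ and $U_m$ are auxiliary names introduced here, $last(w)$ denotes the last state of $w$, and the sums range over the histories at which stage $m$ ends, to which we apply the invariant (2) as if it were stated for such prefix-free sets.

```latex
% P_m : probability that T has been reached by the end of stage m (auxiliary name)
% U_m : Val(s) - P_m, the value that is "still in play" after stage m
\begin{align*}
U_m &= \sum_{w \text{ ends stage } m,\ last(w) \notin T}
        \mathcal{P}^{\sigma}_s[Run(w)] \cdot Val(last(w))
     && \text{by (2), as states in } T \text{ have value } 1,\\
P_{m+1} &\geq P_m + \sum_{w \text{ ends stage } m,\ last(w) \notin T}
        \mathcal{P}^{\sigma}_s[Run(w)] \cdot \tfrac{1}{2} Val(last(w))
      = P_m + \tfrac{1}{2} U_m
     && \text{by the choice of } \sigma_r \text{ and } n_r,\\
U_{m+1} &= Val(s) - P_{m+1} \leq \tfrac{1}{2} U_m,
     \qquad U_0 \leq Val(s),\\
P_m &= Val(s) - U_m \geq (1 - 2^{-m}) \cdot Val(s).
\end{align*}
```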

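For every losing state $s \in S$ and every constant $\varepsilon > 0$, we define $\ell^\varepsilon_s := \inf\{\mathbb{E}^{\sigma}_s[L] \mid \sigma \text{ is } \varepsilon\text{-optimal in } s\}$. Since $\ell^\varepsilon_s \leq \ell^\zeta_s \leq 1$ for $\varepsilon > \zeta$, the limit $\ell_s := \lim_{\varepsilon \to 0} \ell^\varepsilon_s$ exists.
Lemma 2. Assume (*). For every losing state $s$ in $\mathcal{G}$ we have $\ell_s > 0$.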

Proof. By contradiction. Assume that $s$ is losing and $\ell_s = 0$. To every strategy $\sigma$, which may possibly use value-decreasing transitions $r \to r'$ where $Val(r) > Val(r')$, we assign a strategy $\bar\sigma$ which copies the moves of $\sigma$ until a value-decreasing transition is chosen. From that point on, i.e., just before the value-decreasing transition, the strategy $\bar\sigma$ keeps choosing arbitrary successors, with the only requirement that they preserve the value, i.e., whenever $\bar\sigma$ chooses a transition $r \to r'$ with positive probability, $Val(r) = Val(r')$. Such a choice always exists, because $\sup_{r \to r'} Val(r') = Val(r)$, and either $Val(r) = 0$, in which case $Val(r') = 0$ for all $r'$ with $r \to r'$, or $Val(r) > 0$, and by (*) $Val(r)$ cannot be an accumulation point, so there is some $r'$ with $r \to r'$ and $Val(r) = Val(r')$. Observe that for every $\sigma$, $\mathcal{P}^{\sigma}_s[W] - \mathcal{P}^{\bar\sigma}_s[W] \leq \mathbb{E}^{\sigma}_s[L]$. As a consequence, due to $\ell_s = 0$, $Val(s) = \sup\{\mathcal{P}^{\bar\sigma}_s[W] \mid \sigma \text{ is some strategy}\}$. This contradicts $s$ being losing, since $\mathcal{P}^{\bar\sigma}_s[L > 0] = 0$ for every $\sigma$. □

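Proof of Proposition 1. By contradiction. Assume (*), and that there is some $r \in S$ with no strategy optimal in $r$. By Lemma 1 there is a losing state $s \in S$. By Lemma 2, $\ell_s > 0$. Choose some $\varepsilon > 0$ such that $\ell^\varepsilon_s > \ell_s/2 > 0$. Thus under every $\varepsilon$-optimal strategy $\sigma$, with some positive probability $p > 0$, a state $r \in S_1$ with $Val(r) \geq \ell^\varepsilon_s$ is visited and some transition $r \to r'$ with $Val(r') < Val(r)$ is taken. Observe that (*) gives us the following "value gap":

$$\delta := \inf\{|Val(r) - Val(r')| \mid r, r' \in S,\ Val(r) \neq Val(r'),\ Val(r) \geq \ell^\varepsilon_s\} > 0.$$

This allows us to bound $p$ independently of $\sigma$, since $\ell^\varepsilon_s \leq \mathbb{E}^{\sigma}_s[L] \leq p \cdot 1 + (1 - p)(\ell^\varepsilon_s - \delta)$, and hence

$$p \geq \frac{\delta}{1 - \ell^\varepsilon_s + \delta} > 0.$$

Thus for every strategy $\sigma$ we have $Val(s) - \mathcal{P}^{\sigma}_s[W] \geq \min\{\varepsilon, \delta p\}$, which by the bound on $p$ is at least a positive constant independent of $\sigma$. This clearly contradicts the definition of $Val(s)$. The proof is finished. □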



4 Reachability Games 

In this section we prove Theorems 1 and 2. Let us fix a game $\mathcal{G} = (G, W)$, where $G = (S, \to, \delta)$, satisfying Assumption 1. Also assume that $W$ is open, and thus there is $T \subseteq S$ such that $W = \bigcup\{Run(w) \mid w \text{ ends in } T\}$. We call a state $s$ safe if $\forall \sigma \text{ for Max}: \exists \pi_\sigma \text{ for Min}: \mathcal{P}^{\sigma,\pi_\sigma}_s[W] = 0$. The following lemma states strong determinacy restricted to states with value 0, and will be useful in proving both theorems.

Lemma 3. If $\mathcal{G}$ satisfies (*), then for every safe $s \in S$: $\exists \pi$ for Min: $\forall \sigma$ for Max: $\mathcal{P}^{\sigma,\pi}_s[W] = 0$.

Proof. We cut off some choices for Min in the game graph $G$ of $\mathcal{G}$ and obtain its subgraph $G'$, so that all states reachable in $G'$ from $s$ have value 0 in $\mathcal{G}' = (G', W)$. In particular, no run from $s$ in $G'$ can satisfy $W$. Because the choices of Max remain unrestricted in $G'$, this ensures that the probability of $W$ is 0 in $\mathcal{G}$ as well. Let us proceed in more detail.

Observe that every safe state has value 0, so no safe state is in $T$. Also observe that for every safe $r \in S_0 \cup S_1$ and every $s \in S$, if $r \to s$ then $s$ is safe. Likewise, if $r \in S_2$ is safe, then there must be a safe $s$ such that $r \to s$. Fix a safe $s$, and define $G'$ as the smallest subgraph of $G$ containing $s$ and satisfying that if $r$ is in $G'$, then so is every safe successor $r'$ of $r$ in $G$, together with the edge $r \to r'$. As shown above, $G'$ is a game graph, the probability assignment $\delta$ from $G$ is valid in $G'$ as well, and all states in $G'$ are safe. Hence, no path in $G'$ visits $T$, and the value of every state in $\mathcal{G}'$ is 0. Fix an arbitrary strategy $\pi$ for Min in $\mathcal{G}' = (G', W)$; then $\mathcal{P}^{\sigma,\pi}_s[W] = 0$ for all $\sigma$ of Max in $\mathcal{G}'$. All transitions out of safe states of Max were preserved in $G'$, and $\pi$ is also a strategy in $\mathcal{G}$, so we have $\mathcal{P}^{\sigma,\pi}_s[W] = 0$ also for every $\sigma$ of Max in $\mathcal{G}$. □
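For a finite game graph, the closure properties used in this proof suggest computing the safe states as a greatest fixed point: start from $S \setminus T$ and repeatedly drop stochastic or Max states with an unsafe successor, and Min states without a safe successor. The sketch below assumes a finite graph in a hypothetical dictionary encoding in which every listed successor of a stochastic state has positive probability; it also extracts a memoryless Min strategy staying inside the safe set, in the spirit of the subgraph $G'$.

```python
from typing import Dict, List, Set


def safe_states(owner: Dict[str, int], succ: Dict[str, List[str]],
                target: Set[str]) -> Set[str]:
    """Greatest set X outside T such that stochastic and Max states in X have
    all successors in X, and Min states in X have some successor in X."""
    safe = {r for r in owner if r not in target}
    changed = True
    while changed:
        changed = False
        for r in list(safe):
            if owner[r] == 2:        # Min needs one safe successor
                ok = any(s in safe for s in succ[r])
            else:                    # stochastic / Max: every successor must be safe
                ok = all(s in safe for s in succ[r])
            if not ok:
                safe.discard(r)
                changed = True
    return safe


def min_safety_strategy(owner: Dict[str, int], succ: Dict[str, List[str]],
                        safe: Set[str]) -> Dict[str, str]:
    """A memoryless Min strategy that always moves to a safe successor."""
    return {r: next(s for s in succ[r] if s in safe)
            for r in safe if owner[r] == 2}
```

In a graph where a Max state and a Min state can both move to a fair coin between the target and a sink, or directly to the sink, only the Min state and the sink come out safe, since Max could always choose the coin.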

4.1 Proof of Theorem 1

Lemma 4. If $\mathcal{G}$ satisfies (*), then for all $s \in S$ we have: $\forall \pi$ for Min: $\exists \sigma$ for Max: $\mathcal{P}^{\sigma,\pi}_s[W] \geq Val(s)$.
Proof. For every (memoryless, due to Assumption 1) strategy $\pi$ of Player Min, we denote by $\mathcal{G}_\pi$ the game where the choices of Player Min are resolved using $\pi$. Formally, $\mathcal{G}_\pi = (G', W)$, where $G' = (S', \to', \delta')$, and (1) $S' = S$ but comes with a different partition: $S'_0 = S_0 \cup S_2$, $S'_1 = S_1$, $S'_2 = \emptyset$, (2) the relation $\to' \subseteq \to$ is given by $r \to' s$ iff $r \to s$ and either $r \in S_0 \cup S_1$, or $r \in S_2$ and $\pi(r)(s) > 0$, and (3) $\delta' = \delta \cup \pi$. For every strategy $\sigma$ for Player Max and every $s \in S$, the measure $\mathcal{P}^{\sigma,\pi}_s[\cdot]$ in $\mathcal{G}$ obviously coincides with $\mathcal{P}^{\sigma}_s[\cdot]$ in $\mathcal{G}_\pi$. Thus we may apply Proposition 1 to all $\mathcal{G}_\pi$ to derive the lemma. □
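The construction of $\mathcal{G}_\pi$ is a purely syntactic transformation of the game graph, sketched below for a finite game in a hypothetical dictionary encoding (not the paper's). Each Min state becomes stochastic with the weights $\pi(r)$, and only its edges with positive weight are kept.

```python
from typing import Dict, List, Tuple


def resolve_min(owner: Dict[str, int], succ: Dict[str, List[str]],
                delta: Dict[str, Dict[str, float]],
                pi: Dict[str, Dict[str, float]]
                ) -> Tuple[Dict[str, int], Dict[str, List[str]], Dict[str, Dict[str, float]]]:
    """Build G_pi from G and a memoryless Min strategy pi.

    New partition: S'_0 = S_0 ∪ S_2, S'_1 = S_1, S'_2 = ∅; Min edges r -> s
    survive only if pi[r][s] > 0, and their weights are taken from pi.
    """
    new_owner = {r: (0 if o == 2 else o) for r, o in owner.items()}
    new_succ: Dict[str, List[str]] = {}
    new_delta: Dict[str, Dict[str, float]] = {}
    for r, o in owner.items():
        if o == 2:                   # Min state: becomes stochastic with weights pi(r)
            new_succ[r] = [s for s in succ[r] if pi[r].get(s, 0) > 0]
            new_delta[r] = {s: p for s, p in pi[r].items() if p > 0}
        else:                        # stochastic and Max states are unchanged
            new_succ[r] = list(succ[r])
            if o == 0:
                new_delta[r] = dict(delta[r])
    return new_owner, new_succ, new_delta
```

The result has no Min states, so it is a game of the kind treated in Proposition 1 (whether it satisfies (*) is a separate question the sketch does not check).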

Consider now the following game $\mathcal{H} = (H, W)$, which is a slight modification of $\mathcal{G}$. The set of states of $H = (S, \hookrightarrow, \delta_H)$ is $S$, the same as in $G$, with the same partition. There is a transition $r \hookrightarrow s$ iff exactly one of the following three situations occurs: $Val(r) = 0$ in $\mathcal{G}$ and $s = r$; or $Val(r) > 0$, $r \in S_0$, and $r \to s$; or $Val(r) > 0$, $r \notin S_0$, $r \to s$, and $Val(r) = Val(s)$ in $\mathcal{G}$. In other words, in $H$ we made all states with value 0 absorbing, and left only value-preserving transitions for the players. Finally, $\delta_H$ is the only probability weight function which coincides with $\delta$ on stochastic states with positive value.
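Given the values of $\mathcal{G}$, the edges of $H$ can be read off state by state. The sketch below does this for a finite game in a hypothetical dictionary encoding, assuming the values are already known exactly (here as `Fraction`s, so that the equality tests are exact). Under (*), Lemma 5 guarantees that each player state with positive value keeps at least one edge; the sketch raises an error otherwise.

```python
from fractions import Fraction
from typing import Dict, List, Tuple


def build_h(owner: Dict[str, int], succ: Dict[str, List[str]],
            delta: Dict[str, Dict[str, Fraction]], val: Dict[str, Fraction]
            ) -> Tuple[Dict[str, List[str]], Dict[str, Dict[str, Fraction]]]:
    """Edges and weights of H from the (assumed known, exact) values of G."""
    h_succ: Dict[str, List[str]] = {}
    h_delta: Dict[str, Dict[str, Fraction]] = {}
    for r, o in owner.items():
        if val[r] == 0:
            h_succ[r] = [r]                   # value-0 states become absorbing
            if o == 0:
                h_delta[r] = {r: Fraction(1)}
        elif o == 0:
            h_succ[r] = list(succ[r])         # stochastic states keep all edges
            h_delta[r] = dict(delta[r])
        else:                                 # players keep value-preserving edges only
            h_succ[r] = [s for s in succ[r] if val[s] == val[r]]
            if not h_succ[r]:
                raise ValueError(f"no value-preserving edge from {r}")
    return h_succ, h_delta
```

For example, a Max state of value 1 that can move to the target or to a coin of value $1/2$ keeps only the edge to the target, while a Min state of value $1/2$ with the same two choices keeps only the edge to the coin.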

Lemma 5. If $\mathcal{G}$ satisfies (*), then $H$ is a game graph, and the values are the same in $\mathcal{G}$ and $\mathcal{H}$.

Proof. We refine the modifications from above into three steps, obtaining game graphs $H_0 = G$, $H_1$, $H_2$, and $H_3 = H$. We will show for each $i \in \{1, 2, 3\}$ that $H_i$ is a game graph, and that the values are the same in $\mathcal{H}_i = (H_i, W)$ as they are in $\mathcal{G}$. All the graphs constructed have the same set of states $S$ and the same partition as $G$, and the same weight function $\delta_H$ as $H$.

$H_1 = (S, \to_1, \delta_H)$, where $r \to_1 s$ iff $Val(r) = 0$ in $\mathcal{G}$ and $s = r$, or $Val(r) > 0$ and $r \to s$. $H_1$ is clearly a game graph, because $\to_1$ is total. The values did not change, because each absorbing loop outside of $T$ has value 0. Moreover, every $r \in S_2$ always has a successor with the same value. Indeed, if $Val(r) = 0$, then $r$ itself is its own successor in $H_1$; if $Val(r) > 0$, then $\inf_{r \to_1 s} Val(s) = Val(r)$, and by (*), since $Val(r) > 0$ cannot be an accumulation point, there is some $s$ with $r \to_1 s$ and $Val(r) = Val(s)$. By Corollary 1, Min has optimal strategies in $\mathcal{H}_1$.

$H_2 = (S, \to_2, \delta_H)$, where $r \to_2 s$ iff $r \to_1 s$ and either $Val(r) = 0$ in $\mathcal{G}$, or $r \notin S_2$, or ($Val(r) > 0$, $r \in S_2$, and) $Val(r) = Val(s)$ in $\mathcal{G}$. Because Min always has value-preserving transitions in $\mathcal{H}_1$, $H_2$ is clearly a game graph, and by Corollary 1 all strategies of Min in $\mathcal{H}_2$ are optimal. Fix one such $\pi$ for Min, and an arbitrary $s \in S$. By Lemma 4 there is a $\sigma$ for Max in $\mathcal{G}$ (and thus also in $\mathcal{H}_2 = (H_2, W)$) such that $\mathcal{P}^{\sigma,\pi}_s[W] \geq Val(s)$. Because $\pi$ is optimal, $\sigma$ cannot choose value-decreasing transitions. Thus, even when only using edges of $H_3 = H$, we still obtain that $\inf_{\pi} \sup_{\sigma} \mathcal{P}^{\sigma,\pi}_s[W] = Val(s)$. Thus also the graph $H$ is a game graph, and the values in $\mathcal{H}$ and $\mathcal{G}$ are the same. □

Lemma 6. If $\mathcal{G}$ satisfies (*), then Player Max has an optimal strategy $\sigma$ in $\mathcal{H}$.

Proof. We first describe $\sigma$; then we prove that it is optimal. In every state $s$ there is some $\frac{1}{2} Val(s)$-optimal strategy $\tau_s$ for Max. We call a history (i.e., a finite path) $w$, starting in some state $s$ and ending in $r$, lazy if $Val(r) > 0$ and $\inf_{\pi} \mathcal{P}^{\tau_s,\pi}_s[W \mid Run(w)] = 0$. Observe that each history $w$ can be uniquely split into a sequence of sub-paths, separated by single states, $w = s_0 w_0 s_1 w_1 s_2 \cdots s_k w_k$, $k \geq 0$, $s_i \in S$, $w_i \in S^*$, such that for all $i < k$, $s_i w_i s_{i+1}$ is lazy, and for all $i \leq k$, $s_i w_i$ is not lazy. We call $k$ the laziness index of $w$, written $laz(w)$, and $s_k w_k$ the non-lazy suffix of $w$. We now define $\sigma$ for a history $w$ with non-lazy suffix $s_k w_k$ by $\sigma(w) := \tau_{s_k}(s_k w_k)$.

Now we prove that $\sigma$ is optimal. To do so, we need to extend the laziness index to runs. For a run $\omega$, we set $Laz(\omega) := \sup\{laz(w) \mid \omega \in Run(w)\} \in \mathbb{N} \cup \{\infty\}$. Thus we have defined a random variable $Laz$. We prove the following claim, which clearly implies the statement of the lemma:

$$\forall s \in S: \forall \pi \text{ for Min}: \forall k \geq 0: \quad \mathcal{P}^{\sigma,\pi}_s[W \wedge Laz \leq k] \geq Val(s) \cdot (1 - 2^{-k}). \tag{3}$$
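By induction on $k$. Fix some $s \in S$ and a strategy $\pi$ for Min. Clearly, (3) is true for $k = 0$. It is also true when $Val(s) = 0$. Assume thus $Val(s) > 0$ and $k = \ell + 1$ for some $\ell \geq 0$. We set $\mathcal{L}$ to be the set of all finite paths $w$ such that $laz(w) = k$ and the non-lazy suffix consists of only one state. Denote by $last(w)$ the last state of $w$. Observe that, by the definition of $\sigma$ and $\tau_{s_k}$,

$$\forall w \in \mathcal{L}: \forall \pi \text{ for Min}: \quad \mathcal{P}^{\sigma,\pi}_s[W \mid Run(w)] \geq \tfrac{1}{2} Val(last(w)). \tag{4}$$

Let $A$ be any prefix-free set of finite paths such that $\mathcal{P}^{\sigma,\pi}_s[\bigcup_{w \in A} Run(w)] = 1$. Because $H$ only contains value-preserving edges for players, we have

$$Val(s) = \sum_{w \in A} \mathcal{P}^{\sigma,\pi}_s[Run(w)] \cdot Val(last(w)). \tag{5}$$

We have $p := \mathcal{P}^{\sigma,\pi}_s[W \wedge Laz \leq \ell] \geq Val(s) \cdot (1 - 2^{-\ell})$, by the induction hypothesis. We also have $q := \sum_{w \in \mathcal{L}} \mathcal{P}^{\sigma,\pi}_s[Run(w)] \cdot Val(last(w)) = Val(s) - p$, by (5). By (4), $\mathcal{P}^{\sigma,\pi}_s[W \wedge Laz = k] \geq q \cdot \frac{1}{2}$. Finally,

$$\begin{aligned} \mathcal{P}^{\sigma,\pi}_s[W \wedge Laz \leq k] &= \mathcal{P}^{\sigma,\pi}_s[W \wedge Laz \leq \ell] + \mathcal{P}^{\sigma,\pi}_s[W \wedge Laz = k]\\ &\geq p + q \cdot \tfrac{1}{2} = p + (Val(s) - p) \cdot \tfrac{1}{2} = \tfrac{p}{2} + \tfrac{Val(s)}{2}\\ &\geq (2^{-1} - 2^{-(\ell+1)}) \cdot Val(s) + Val(s) \cdot 2^{-1} = (1 - 2^{-(\ell+1)}) \cdot Val(s). \end{aligned}$$

□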
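The arithmetic of the induction step can be checked mechanically: starting from the trivial bound $p = 0$ for $k = 0$ and applying the step $p \mapsto p/2 + Val(s)/2$ from the last display, the bound after $k$ steps is exactly $Val(s) \cdot (1 - 2^{-k})$. The hypothetical helper below does this with exact fractions; it checks only the recurrence, not the probabilistic argument.

```python
from fractions import Fraction


def induction_bound(val: Fraction, k: int) -> Fraction:
    """Iterate the step p -> p/2 + val/2 of the proof k times, starting at p = 0."""
    p = Fraction(0)
    for _ in range(k):
        p = p / 2 + val / 2
    return p
```

Each step halves the gap $Val(s) - p$, which is where the factor $2^{-k}$ in (3) comes from.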

Proof of Theorem 1. Consider the strategy $\sigma$ from Lemma 6. It partially defines a strategy in $\mathcal{G}$. To complete its definition, we now specify it for histories containing a transition of the form $r \to s$, where $r \in S_2$ and $Val(s) > Val(r)$, by requiring $\sigma$ to behave as a $\frac{1}{2}(Val(s) - Val(r))$-optimal strategy from that point on. Fix an initial state $s$, and consider an arbitrary strategy $\pi$ of Min. If $\pi$ is optimal, then it is also valid in $\mathcal{H}$, and $\mathcal{P}^{\sigma,\pi}_s[W] = Val(s)$ by Lemmata 5 and 6. For a non-optimal $\pi$, it is easy to verify that $\mathcal{P}^{\sigma,\pi}_s[W] \geq Val(s)$ by both the definition of $\sigma$ and Lemmata 5 and 6. □

4.2 Proof of Theorem 2

If both players have optimal strategies, the game is strongly determined. However, even under Condition (*), Player Min may not always have an optimal strategy, because of states with value 0 where no value-preserving transition is available to Min. See the game in [7, Fig. 1], restricted to the states reachable from $s$, for an example. Theorem 2 is a direct consequence of Lemma 7 and Lemma 9, where the former lemma deals with all "easy cases", and the latter "patches" the above deficiency by using Lemma 3 to deal with states with value 0, and "restores" the optimal strategies for both players in the rest.

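Lemma 7. Assume that $\mathcal{G}$ satisfies (*). Let $s \in S$, $0 \leq v \leq 1$, and $\rhd \in \{>, \geq\}$. Assume that either $Val(s) = 0$, or $v \neq Val(s)$, or $\rhd$ is $\geq$. Then either Player Max has a strategy $\sigma$ such that $\forall \pi: \mathcal{P}^{\sigma,\pi}_s[W] \rhd v$, or Player Min has a strategy $\pi$ such that $\forall \sigma: \mathcal{P}^{\sigma,\pi}_s[W] \not\rhd v$.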

Proof. The case when $v = Val(s) = 0$ is solved by Lemma 3. If $v < Val(s)$, we can choose any $\frac{1}{2}(Val(s) - v)$-optimal strategy for Max as $\sigma$. Similarly, if $v > Val(s)$, we can choose any $\frac{1}{2}(v - Val(s))$-optimal strategy for Min as $\pi$. If $v = Val(s)$ and $\rhd$ is $\geq$, we can choose any optimal strategy for Max as $\sigma$. Such a strategy exists due to Theorem 1. □
It remains to solve the case v = Val(s) > 0 and ⊳ is >. We perform two preprocessing steps on 𝒢, first obtaining 𝒢' and then ℋ. In ℋ both players will have optimal strategies, and if Max does not have a strategy in 𝒢 to always win with probability > Val(s), we will be able to lift such an optimal strategy for Min back to 𝒢.

We fix s ∈ S with Val(s) > 0 and set R := {r ∈ S | Val(r) = 0 ∧ ∃σ: ∀π: P^{σ,π}_r[W] > 0}. Intuitively, if Max does not have a strategy to always win with probability > Val(s), then Min can always respond to a strategy σ of Max with a strategy π_σ such that R is not visited at all from s under these strategies, and yet Max wins with probability at most Val(s). Thus, if we cut off all states from R, producing the game ℋ, we obtain a valid game graph, and the values of the states do not change.

Before we describe this formally, we observe that neither of the players benefits from using transitions which do not preserve the value. Let 𝒢' = (G', W), with G' = (S, ↦, δ), be the game obtained by restricting the edges of G to value-preserving ones where possible: for all r, r' ∈ S we require that r ↦ r' iff r → r' and either r ∈ S_0 ∪ R, or Val(r) = Val(r').
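As an illustration only, here is a sketch of how the restriction defining G' could be computed on a finite graph whose values are already known. The paper's graphs are infinite and the values are taken as input, not computed. The names `restrict_edges`, `succ`, `owner` and `val` are hypothetical. An error is raised if a Max or Min state outside R would lose all of its successors, since 𝒢' must remain a valid game graph.

```python
from fractions import Fraction

def restrict_edges(succ, owner, val, R):
    """Keep r -> r2 iff r is stochastic, r is in R, or Val(r2) == Val(r).

    succ:  dict state -> list of successors (finite illustration)
    owner: dict state -> "max", "min" or "stoch"
    val:   dict state -> Fraction, the values (assumed known)
    R:     set of states whose outgoing edges are kept unchanged
    """
    restricted = {}
    for r, targets in succ.items():
        if owner[r] == "stoch" or r in R:
            restricted[r] = list(targets)
        else:
            kept = [t for t in targets if val[t] == val[r]]
            if not kept:
                # the restricted graph would have a deadlock at r
                raise ValueError(f"no value-preserving successor at {r!r}")
            restricted[r] = kept
    return restricted

# Toy graph: t is the (absorbing) target, z an absorbing non-target sink,
# x a fair coin between t and z; a belongs to Max, m to Min.
succ = {"a": ["x", "z"], "m": ["x", "z"], "x": ["t", "z"], "t": ["t"], "z": ["z"]}
owner = {"a": "max", "m": "min", "x": "stoch", "t": "stoch", "z": "stoch"}
val = {"a": Fraction(1, 2), "m": Fraction(0), "x": Fraction(1, 2),
       "t": Fraction(1), "z": Fraction(0)}
print(restrict_edges(succ, owner, val, R=set()))
```

Max keeps only the edge to x and Min keeps only the edge to z, while the stochastic states are untouched.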

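Lemma 8. Assume that 𝒢 satisfies (*). Then the values in 𝒢 and in 𝒢' are the same, and for all s ∈ S, each of the following is true in 𝒢' if it is true in 𝒢:

$$\forall \sigma \text{ for Max}: \exists \pi_\sigma \text{ for Min}: \mathcal{P}^{\sigma,\pi_\sigma}_s[W] \le \mathit{Val}(s), \tag{6}$$
$$\forall \pi \text{ for Min}: \exists \sigma_\pi \text{ for Max}: \mathcal{P}^{\sigma_\pi,\pi}_s[W] > \mathit{Val}(s). \tag{7}$$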

Proof. By Theorem 1 there is an optimal strategy, σ, for Max. This is also a strategy in 𝒢', so for all s ∈ S, Val(s) in 𝒢' is at least Val(s) in 𝒢. On the other hand, by Corollary 1, cutting off non-optimal edges leaving states from S_2 \ R does not alter the values. Further, cutting off non-optimal edges leaving states from S_1 can only decrease the values. Thus, for all s ∈ S, the values in 𝒢' and in 𝒢 are equal.

Now we fix some s ∈ S and prove that if (6) is true in 𝒢, then it is true in 𝒢'. Let σ be a strategy for Max in 𝒢', i.e., a strategy for 𝒢 which does not use value-decreasing edges. If σ is optimal, then the strategy π_σ from (6) in 𝒢 necessarily uses value-preserving edges everywhere, and thus it is valid in 𝒢' as well. If σ is not optimal, consider again the response π_σ of Min satisfying (6) in 𝒢. If π_σ cannot be used directly in 𝒢', then there must be some r ∈ S_2 \ R where π_σ chooses a successor r' with Val(r') > Val(r). But because r ∉ R, there must also be a successor r'' such that Val(r'') = Val(r). We modify π_σ to a strategy π'_σ which, for every such r, chooses the value-preserving successor instead of r', and continues as a 1/2·(Val(r') − Val(r))-optimal strategy in 𝒢'. Clearly, P^{σ,π'_σ}_s[W] ≤ P^{σ,π_σ}_s[W] in 𝒢, and since π'_σ is also a strategy in 𝒢', (6) is true in 𝒢' as well.

Finally, we prove that if (7) is true in 𝒢, then it is true in 𝒢'. Let π be a strategy in 𝒢'. Fixing the choices of π in 𝒢' outside of R defines a game 𝒢_π. By Corollary 1, 𝒢_π has the same values as 𝒢. Thus, optimal strategies of Max in 𝒢_π exist, because 𝒢_π satisfies (*), and they only choose edges preserving the value in 𝒢. Consider the strategy σ_π witnessing (7) in 𝒢. We now define a strategy σ'_π in 𝒢': it copies the moves of σ_π unless σ_π chooses some value-decreasing edge. In that case, instead of following σ_π, σ'_π immediately switches to some optimal strategy for 𝒢_π. Since the values in 𝒢_π are the same as in 𝒢, this only increases the probability of winning; thus P^{σ'_π,π}_s[W] ≥ P^{σ_π,π}_s[W]. □

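Lemma 9. Assume that 𝒢 satisfies (*). For all s ∈ S such that Val(s) > 0, if

$$\forall \sigma \text{ for Max}: \exists \pi_\sigma \text{ for Min}: \mathcal{P}^{\sigma,\pi_\sigma}_s[W] \le \mathit{Val}(s), \tag{8}$$

then

$$\exists \pi \text{ for Min}: \forall \sigma \text{ for Max}: \mathcal{P}^{\sigma,\pi}_s[W] \le \mathit{Val}(s). \tag{9}$$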
Figure 2: Left: A game, 𝒢, where Player Max (□) does not have optimal strategies. All stochastic (○) states have a uniform distribution on outgoing transitions. Right: A One Counter description of 𝒢. Signed numbers represent counter increments.



Proof. By Lemma 8, if (8) ⟹ (9) holds in 𝒢', then the implication holds in 𝒢 as well, and if 𝒢 satisfies (*), then so does 𝒢'. Thus we focus on 𝒢' instead. We describe the modification of 𝒢', called ℋ, where we cut off R. By Att(R) we denote the set of all states, r, such that in 𝒢' Max has a strategy, σ, such that for all π for Min, P^{σ,π}_r[Reach R] > 0. Further, we consider the edge relation ↪, which is simply the relation ↦ without the edges leading to states from Att(R).

We fix some s with Val(s) > 0 satisfying (8), and by S' we denote the set of all r ∈ S to which there is a path from s in the graph (S, ↪). Consider the game graph H = (S', ↪, δ), inheriting the partition of states from G; its edge relation is the relation ↪ defined above, restricted to S' × S'. Observe that if r ∈ S' ∩ (S_0 ∪ S_1) and r ↦ r' for some r' ∈ S, then r' ∈ S'. This is because r ∉ Att(R) implies r' ∉ Att(R) if r is not owned by Min. Similarly, for all r ∈ S' ∩ S_2 there is an r' ∈ S' such that r ↪ r'. Thus δ, restricted to S', is still a valid probability weight function, and H is a valid game graph. We abuse the letter W to denote the restriction of W to H, and define the game ℋ = (H, W).
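Continuing the finite illustration, H can be sketched by deleting the edges that enter Att(R) and keeping the part reachable from s. The two checks mirror the observations above: Max and stochastic states keep all their successors, and every Min state keeps at least one. The names are hypothetical, and the sketch assumes s ∉ Att(R).

```python
from collections import deque

def cut_off(succ, owner, att, s):
    """Successor map of H: edges into att removed, restricted to states reachable from s."""
    hook = {r: [t for t in ts if t not in att] for r, ts in succ.items()}
    reachable = {s}
    queue = deque([s])
    while queue:  # breadth-first search over the relation without edges into att
        r = queue.popleft()
        for t in hook[r]:
            if t not in reachable:
                reachable.add(t)
                queue.append(t)
    H = {r: hook[r] for r in reachable}
    for r in reachable:
        if owner[r] == "min" and not H[r]:
            raise ValueError(f"Min state {r!r} has no successor left")
        if owner[r] != "min" and H[r] != succ[r]:
            raise ValueError(f"state {r!r} lost an edge into Att(R)")
    return H
```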

Because all edges leaving states from S_2 \ R were value-preserving in 𝒢', Corollary 1 yields that the values in ℋ stay the same as they were in 𝒢', and that there is an optimal strategy, π, for Min in ℋ. This is also a strategy for 𝒢', and because the choices of Player Max were not affected when reducing 𝒢' to ℋ, we obtain that for all σ for Max we have P^{σ,π}_s[W] ≤ Val(s) both in ℋ and in 𝒢'. This proves (9). □



5 One Counter Games 

One Counter stochastic games (OC-SSGs), see, e.g., [1, 2, 4], are games played on transition graphs of one-counter automata. Such automata have a finite set Q of control states and a set of rules, which are triples of the form (r, k, s) with r, s ∈ Q and k ∈ {−1, 0, +1}. States of an OC-SSG are then of the form s_n, where s ∈ Q is a control state and n ≥ 0 is an integer representing the counter value. Transitions are generated by setting r_i → s_j if i > 0 and there is a rule (r, j − i, s). Moreover, states with counter 0 are made absorbing, s_0 → s_0, to reflect that the system halts with the empty counter. The partition of states is induced by a partition of Q, and the probabilities of transitions out of stochastic states are induced by probabilities on rules. OC-SSGs come with an implicit reachability objective: the set to be reached is the set {s_0 | s ∈ Q} of states with counter 0. Because the system halts there, we also call this a termination winning condition.
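The way the transitions of an OC-SSG are generated from the rules can be sketched for counter values up to a bound N. This is an illustration only. The truncation at N is an assumption, since the real game is infinite, so states with counter N lose the edges that would increase the counter. Rule probabilities and the ownership partition, both inherited from Q, are omitted. The name `oc_transitions` is hypothetical.

```python
def oc_transitions(rules, control_states, bound):
    """Successor map of the OC-SSG states (q, n) with 0 <= n <= bound.

    rules: iterable of triples (r, k, s) with k in {-1, 0, +1}.
    A rule (r, j - i, s) yields r_i -> s_j for i > 0; counter 0 is absorbing.
    """
    succ = {}
    for q in control_states:
        succ[(q, 0)] = [(q, 0)]  # s_0 -> s_0: the system halts
    for i in range(1, bound + 1):
        for q in control_states:
            succ[(q, i)] = [
                (s, i + k)
                for (r, k, s) in rules
                if r == q and i + k <= bound  # drop moves above the truncation
            ]
    return succ

# (s, 0, u) and (t, +1, t) appear in Example 2; (z, -1, z) is a hypothetical extra rule.
succ = oc_transitions([("s", 0, "u"), ("t", 1, "t"), ("z", -1, "z")], ["s", "u", "t", "z"], 3)
print(succ[("t", 1)], succ[("z", 1)])
```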
Example 2. In the right-hand part of Figure 2 we give the one-counter automaton with the set Q = {s, u, d, r, z, t} of control states. An unlabelled edge, like s → u, represents a 0-rule, e.g., (s, 0, u). A label (±1) represents the counter change, e.g., the loop t → t represents (t, +1, t). The square state s belongs to Max; the other states are stochastic. The distributions on outgoing transitions are implicitly uniform in this example. The left-hand part shows the generated OC-SSG. Grey states are to be reached. Later in this section we will show that Val(s_i) = (2^i + 1)/2^{i+1} for all i ≥ 0, but that no strategy of Player Max is optimal in s_1. Observe that

1/2 = lim_{i→∞} (2^i + 1)/2^{i+1} is an accumulation point of the set of all values.

Note that every OC-SSG has bounded out-degree and in-degree; in particular, it is finitely branching. Thus Min always has optimal strategies in OC-SSGs. However, OC-SSGs need not satisfy (*), and Example 2 shows that in OC-SSGs Max may have no optimal strategies. On the other hand, the structure of the accumulation points of the set of all values is well understood for OC-SSGs. To describe it, we need to introduce another winning objective.

In OC-SSGs there is an implicit boundary on the counter value: if it reaches zero, the system halts. However, we may also interpret the one-counter automaton as a directed graph on Q, with the rules as edges labelled by rewards (the counter changes). This way we obtain a finite game graph. Accumulating those rewards along a run in such a game graph then corresponds to observing the counter in the OC-SSG, with the exception that the counter does not stop at 0 and may become negative. Adding the winning condition (for Max) that the liminf of the accumulated rewards be −∞, we obtain (Liminf = −∞)-games.

In [1, 3] it was shown that both players always have pure and memoryless optimal strategies in (Liminf = −∞)-games, and that the optimal value is always rational and computable. Observe that the termination values, Val(s_n), for a fixed s ∈ Q, are non-increasing in n. Thus their limit exists, and, in fact, it is an easy exercise to employ the results of [2, 4] to prove that the (Liminf = −∞)-value of a control state, s, equals lim_{n→∞} Val(s_n). Intuitively, this is because, as the initial counter value n increases, the objective of reaching counter value 0 becomes more and more similar to the (Liminf = −∞) objective. Thus the set of (Liminf = −∞)-values of all states s ∈ Q contains the set of all accumulation points of the termination values. It is also possible to decide in time polynomial in |Q| whether a (Liminf = −∞)-value, v, actually is an accumulation point, i.e., whether for all states, s, with (Liminf = −∞)-value v the sequence of termination values stabilises after finitely many steps.

Corollary 2. Let 𝒢 be an OC-SSG with the set Q of control states. If for every s ∈ Q the (Liminf = −∞)-value of s is 1 or 0, then Player Max has an optimal strategy for termination in 𝒢.

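Proof. Since Q is finite, every accumulation point of the termination values is one of the limits lim_{n→∞} Val(s_n), hence 0 or 1. These limits are approached from above, because Val(s_n) ≥ Val(s_{n+1}) for all s ∈ Q and all n ≥ 0, so a limit equal to 1 is attained after finitely many steps. Thus, 1 is not an accumulation point, and we may apply Theorem 1. □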

Note that the class of OC-SSGs satisfying the condition of Corollary 2 includes all OC-SSGs where the graph of rules is strongly connected and one of the players is missing. This is because (Liminf = −∞) is a prefix-independent objective, and strong connectivity allows the only player to reach each control state almost surely; thus all control states have the same (Liminf = −∞)-value. By results of [11, Theorem 3.2], such a common value can only be 0 or 1. In particular, Corollary 2 covers both the Solvency games, see [1], and their maximizing variant.

In Solvency games, a gambler has an initial positive amount of money and in each step chooses one of finitely many actions. Each action is associated with a distribution on a finite set of integers. A number from this set is then sampled and added to the gambler's wealth (it can, however, be negative), and the process ends only when the wealth becomes < 0. This is easily modelled by one-player OC-SSGs (see [1]), and these have strongly connected graphs of rules, because the only state where the gambler chooses the action is reachable from all other states. The natural scenario for these games is, obviously, the one with Player Min, and there the existence of optimal strategies follows from the finite branching. However, the dual situation, with Player Max, is theoretically interesting as well, and we are not aware of any result prior to our Corollary 2 indicating the existence of optimal strategies for Player Max.

5.1 Analysis of Example 2

Consider an arbitrary n ≥ 1. It is easy to see that Val(r_n) = 1/2. Observe that, starting in u_n, s_{n+1} is visited with probability Σ_{i=0}^{∞} 2^{−(2i+1)} = 2/3, and s_{n−1} with probability 1/3.

Lemma 10. For the unique strategy, σ, not using the transitions s_n → r_n, n ≥ 1, we have P^σ_{s_i}[W] = 2^{−i}.

Proof. Clearly P^σ_{s_0}[W] = 1 = 2^{−0}. Further, x := P^σ_{s_1}[W] is the least non-negative solution of the equation x = 1/3 + 2/3·x², see, e.g., [9, Theorem 3.4] or [10, Theorem 1], which is 1/2. Solving the recurrence P^σ_{s_i}[W] = 1/3·P^σ_{s_{i−1}}[W] + 2/3·P^σ_{s_{i+1}}[W], given the initial conditions for i = 0, 1, yields P^σ_{s_i}[W] = 2^{−i}. □
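Both steps of this proof can be checked numerically in a small sketch with hypothetical names. Kleene iteration of x ↦ 1/3 + 2/3·x², starting from 0, increases towards the least non-negative fixpoint 1/2; the other fixpoint is 1. Exact rationals then confirm that 2^{−i} satisfies the recurrence.

```python
from fractions import Fraction

def least_fixpoint(iterations: int = 200) -> float:
    """Iterate x -> 1/3 + 2/3 * x^2 from 0; the iterates increase to the least fixpoint."""
    x = 0.0
    for _ in range(iterations):
        x = 1 / 3 + 2 / 3 * x * x
    return x

def prob(j: int) -> Fraction:
    """Claimed winning probability 2^-j from s_j under the strategy of Lemma 10."""
    return Fraction(1, 2**j)

def satisfies_recurrence(i: int) -> bool:
    """Check P(s_i) = 1/3 P(s_{i-1}) + 2/3 P(s_{i+1}) exactly."""
    return prob(i) == Fraction(1, 3) * prob(i - 1) + Fraction(2, 3) * prob(i + 1)

print(least_fixpoint())
print(all(satisfies_recurrence(i) for i in range(1, 30)))
```

Near 1/2 the map has slope 2/3, so the iteration converges linearly, and 200 steps are far more than enough for double precision.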

Lemma 11. Val(s_1) = 3/4.

Proof. First we prove Val(s_1) ≥ 3/4. For any n consider the memoryless strategy, σ_n, given by σ_n(s_i)(u_i) = 1 if i < n and σ_n(s_i)(r_i) = 1 if i ≥ n. Set p_i := P^{σ_n}_{s_1}[Reach s_i]. Observe that p_i does not change if we define it using any σ_n with n ≥ i, and that 1 − p_i = P^{σ_n}_{s_1}[W ∧ ¬Reach s_i] for n ≥ i. Moreover, p_1 = 1 and

p_{i+1} = 2/3·(p_i + (1 − p_i)·p_{i+1}).

This uniquely determines that p_i = 2^{i−1}/(2^i − 1). Finally, observe that P^{σ_n}_{s_1}[W] = (1 − p_n) + p_n·1/2; thus Val(s_1) ≥ lim_{n→∞} ((1 − p_n) + p_n·1/2) = 3/4.

Now we prove that Val(s_1) ≤ 3/4 by proving P^σ_{s_1}[W] ≤ 3/4 for all σ. Consider the following probabilities: p_a := P^σ_{s_1}[W ∧ ¬Reach some r_j], p_b := P^σ_{s_1}[W ∧ Reach some r_j], and p_c := P^σ_{s_1}[Reach some r_j]. Clearly p_b = p_c/2. Applying Lemma 10 to i = 1, we also have p_a ≤ 1/2. Finally, p_a + p_c ≤ 1, since the events are disjoint. We conclude that P^σ_{s_1}[W] = p_a + p_b ≤ p_a + 1/2·(1 − p_a) = 1/2·p_a + 1/2 ≤ 3/4. □
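The lower-bound part of this proof can be replayed with exact rationals. Solving the displayed equation for p_{i+1} gives p_{i+1} = 2p_i/(1 + 2p_i). The sketch below, with hypothetical names, iterates this formula and compares the result against p_i = 2^{i−1}/(2^i − 1). It then evaluates the bound (1 − p_n) + p_n·1/2 achieved by σ_n, which increases towards 3/4 without reaching it.

```python
from fractions import Fraction

def p_values(n: int) -> list:
    """p[1..n] from p_1 = 1 and p_{i+1} = 2 p_i / (1 + 2 p_i); index 0 is unused."""
    p = [None, Fraction(1)]
    for i in range(1, n):
        p.append(2 * p[i] / (1 + 2 * p[i]))
    return p

def lower_bound(n: int) -> Fraction:
    """Winning probability of sigma_n from s_1: (1 - p_n) + p_n * 1/2."""
    p_n = p_values(n)[n]
    return (1 - p_n) + p_n * Fraction(1, 2)

p = p_values(10)
for i in range(1, 10):
    assert p[i] == Fraction(2**(i - 1), 2**i - 1)  # closed form from the proof
    assert p[i + 1] == Fraction(2, 3) * (p[i] + (1 - p[i]) * p[i + 1])  # implicit recurrence
print([float(lower_bound(n)) for n in (1, 2, 5, 20)])
```

Each σ_n wins with probability strictly below 3/4, in line with the absence of an optimal strategy.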

Lemma 12. Val(s_i) = (2^i + 1)/2^{i+1} for all i ≥ 0.

Proof. The case i = 0 is trivial, and i = 1 is Lemma 11. Solving the recurrence Val(s_i) = 1/3·Val(s_{i−1}) + 2/3·Val(s_{i+1}), given the initial conditions for i = 0, 1, yields Val(s_i) = (2^i + 1)/2^{i+1}. □
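Here is a quick exact check of this lemma and of the comparison drawn from it, using hypothetical names. The closed form meets both initial conditions, namely Val(s_0) = 1 and Val(s_1) = 3/4 from Lemma 11. It also satisfies the recurrence, and for i ≥ 1 it exceeds both Val(r_i) = 1/2 and the probability 2^{−i} from Lemma 10.

```python
from fractions import Fraction

def val_s(i: int) -> Fraction:
    """Closed form Val(s_i) = (2^i + 1) / 2^(i+1)."""
    return Fraction(2**i + 1, 2**(i + 1))

assert val_s(0) == 1 and val_s(1) == Fraction(3, 4)  # initial conditions
for i in range(1, 40):
    # recurrence through u_i: down with probability 1/3, up with probability 2/3
    assert val_s(i) == Fraction(1, 3) * val_s(i - 1) + Fraction(2, 3) * val_s(i + 1)
    # strictly above Val(r_i) = 1/2, which is itself at least 2^-i
    assert val_s(i) > Fraction(1, 2) >= Fraction(1, 2**i)
```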

In particular, for all i ≥ 1, Val(s_i) > Val(r_i); thus no optimal strategy may use the transitions s_n → r_n, n ≥ 1. By Lemma 10, the only remaining strategy wins from s_i with probability 2^{−i} < Val(s_i), so there are no optimal strategies in s_i.
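Finally, the behaviour of Example 2 can be observed with value iteration on a truncated abstraction. The sketch assumes only the quantities established above. From s_i, Max either moves to r_i, which is worth 1/2, or to u_i, from which s_{i+1} follows with probability 2/3 and s_{i−1} with probability 1/3; s_0 has value 1. Truncating at a level N is an assumption, and only s_N → r_N is allowed there. The computed value of s_1 then stays below 3/4 but gets close to it, consistent with Lemma 11: every σ_n is suboptimal, yet their values converge to 3/4. The names and the bound N are hypothetical.

```python
def value_iteration(N: int, iterations: int = 2000) -> list:
    """Approximate values of s_0..s_N in a truncated abstraction of Example 2."""
    V = [1.0] + [0.0] * N  # V[0] = Val(s_0) = 1; other levels start at 0
    for _ in range(iterations):
        new = V[:]
        for i in range(1, N):
            via_u = V[i - 1] / 3 + 2 * V[i + 1] / 3  # move s_i -> u_i
            new[i] = max(0.5, via_u)                  # or s_i -> r_i, worth 1/2
        new[N] = 0.5  # truncation: at level N only s_N -> r_N is allowed
        V = new
    return V

V = value_iteration(20)
print(V[1], V[2])
```

The iteration starts below the fixpoint and the update is monotone, so the approximations increase towards the truncated game's values. They never overshoot.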